Node timing #747

Open
wants to merge 10 commits into
base: main

Collaborator

@paigerube14 paigerube14 commented Jan 14, 2025

This PR adds AffectedNode timing to a list that we can track in the telemetry and other output.

It tracks how long the cloud provider takes to stop/start the node and, once the node has been stopped/started on the cloud side, how long the node then takes to reach the not ready/ready state.
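
As a rough illustration of the kind of measurement described above, here is a minimal sketch that polls a condition and returns the elapsed seconds. The name `time_until` and its parameters are hypothetical and are not taken from this PR or from krkn-lib; the real code may measure these times differently.

```python
import time
from typing import Callable


def time_until(predicate: Callable[[], bool],
               timeout: float = 300.0,
               poll_interval: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll `predicate` until it returns True and return the elapsed seconds.

    Hypothetical helper: `predicate` would wrap a check such as
    "the cloud provider reports the instance as stopped" or
    "the node reports Ready".
    Raises TimeoutError if the predicate is still False after `timeout` seconds.
    """
    start = clock()
    while not predicate():
        if clock() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(poll_interval)
    return clock() - start
```

The clock and sleep functions are injectable so the helper can be exercised without real waiting. `time.monotonic` is the default because it is unaffected by system clock adjustments.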

Needs to go in after krkn-chaos/krkn-lib#143

Any suggestions on how to make this not touch as many files?

The new telemetry section will look like this for a stop/start scenario:

    "affected_nodes": [
        {
            "node_name": "ip-*.us-east-2.compute.internal",
            "not_ready_time": 0.0,
            "ready_time": 24.439035892486572,
            "unknown_time": 0.15461111068725586,
            "stopped_time": 136.35135626792908,
            "running_time": 15.249454021453857,
            "terminating_time": 0.0
        },
        {
            "node_name": "ip-*.us-east-2.compute.internal",
            "not_ready_time": 0.0,
            "ready_time": 23.62007188796997,
            "unknown_time": 0.15206694602966309,
            "stopped_time": 166.52288508415222,
            "running_time": 15.330392122268677,
            "terminating_time": 0.0
        }
    ],
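
For readers who want to experiment with records of this shape, the entries above could be modeled roughly as follows. This is only a sketch: `NodeTiming`, `record`, and `to_telemetry` are hypothetical names, not the `AffectedNode` class from krkn-lib, and whether the real implementation overwrites or accumulates each time is an assumption here (this version overwrites).

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class NodeTiming:
    """Hypothetical stand-in for one entry of "affected_nodes"."""
    node_name: str
    not_ready_time: float = 0.0
    ready_time: float = 0.0
    unknown_time: float = 0.0
    stopped_time: float = 0.0
    running_time: float = 0.0
    terminating_time: float = 0.0

    def record(self, state: str, seconds: float) -> None:
        """Store the seconds measured for `state`, e.g. "stopped" or "ready"."""
        field_name = f"{state}_time"
        if field_name not in {f.name for f in fields(self)}:
            raise ValueError(f"unknown state: {state}")
        # Assumption: the latest measurement replaces any earlier one.
        setattr(self, field_name, seconds)


def to_telemetry(nodes: List[NodeTiming]) -> str:
    """Serialize the nodes under an "affected_nodes" key, as in the sample."""
    return json.dumps({"affected_nodes": [asdict(n) for n in nodes]}, indent=4)
```

States that are never recorded keep their default of 0.0, which matches the `not_ready_time` and `terminating_time` values in the sample output.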

Collaborator

@chaitanyaenr chaitanyaenr left a comment

Tested different combinations and the reporting is working as expected. Working node scenario configs:

  • one action: node_stop_start_scenario with an instance count of one
  • one action: node_stop_start_scenario with an instance count greater than one

Scenarios with multiple actions fail to report the metrics because of a bug in the node-scenarios code base that is outside this PR: #749

@@ -247,6 +253,8 @@ def run_node(self, single_node, node_scenario_object, action, node_scenario):
"There is no node action that matches %s, skipping scenario"
% action
)
logging.info('last line run node')

Leftover logging from the test runs; it needs to be removed.
